1 rANOMALY step-by-step use case.

1.1 Help

Each function have a detailed help accessible in R via ?{funtion}.

1.2 Tests datasets

The dataset can be downloaded via this link.

This tutorial assume that you have extracted all the read file in a folder named reads along with the sample-metadata.csv file.

We share a 24 samples test dataset extract from rats feces at two different time (t0 & t50) and in two nutrition conditions. Also included two extraction control sample (blank).

sm <- read.table("sample_metadata.csv", sep="\t",header=TRUE)
DT::datatable(sm)
load("decontam_out/robjects.Rdata")

1.3 ASV definition with DADA2

The first step will be the creation of ASVs (Amplicon Sequence Variants) thanks to the dada2 package. In rANOMALY, only one function is needed to compute all the different steps require from this package.

Sample names will be extracted from the file name, so files must be formatted as followed : {sample-id1}_R1.fastq.gz {sample-id1}_R2.fastq.gz etc…

dada_res = dada2_fun(path="./reads", dadapool = "pseudo", compress=TRUE, plot=FALSE)

Main output: - read_tracking.csv that summarize the read number after each filtering step.

DT::datatable(read.table("dada2_out/read_tracking.csv",sep="\t",header=TRUE))

The sample names extracted from the file name. We consider as sample name anything that is before the first underscore. This must match the sample names that are in sample metadata files. input: raw read number. filtered: after dada2 filtering step: no N’s in sequence, low quality, and phiX. denoisedF & denoisedR: after denoising. Forward & Reverse. merged: after merging R1 & R2. nonchim: after chimeras filtering.

  • dada2_robjects.Rdata with raw ASV table and representative sequences in objects otu.table, seqtab.export & seqtab.nochim.
  • raw_asv-table.csv
  • rep-seqs.fna

1.4 Taxonomic assignment

This function uses IDTAXA function from DECIPHER package, and allows to use 2 differents databases. It keeps the best assignation on 2 criteria, resolution (depth) and confidence. The final taxonomy is validated by multiple ancestors taxa and incongruity correction step.

We share the latest databases we use in the IDTAXA format in this link. You can also generate your own database following those instructions and scripts we provide in another repository.

tax.table = assign_taxo_fun(dada_res = dada_res, id_db = c("path_to_your_banks/silva/SILVA_SSU_r132_March2018.RData","path_to_your_banks/DAIRYdb_v1.2.0_20190222_IDTAXA.RData") )

Main output: - taxo_robjects.Rdata with taxonomy in phyloseq format in tax.table object. - final_tax_table.csv the final assignation table that will be use in next steps. - allDB_tax_table.csv raw assignations from the two databases, mainly for debugging.

1.5 Phylogenetic Tree

The phylogenetic tree from the representative sequences is generated using phangorn and DECIPHER packages.

tree = generate_tree_fun(dada_res)

Main output: - tree_robjects.Rdata with phylogenetic tree object in phyloseq format.

1.6 Phyloseq object

To create a phyloseq object, we need to merge four objects and one file: - the asv table otu.table and the representative sequences seqtab.nochim from dada2_robjects.Rdata - a taxonomy table taxo_robjects.Rdata from taxo_robjects.Rdata - the phylogenetic tree tree from tree_robjects.Rdata - metadata from sample-metadata.csv

data = generate_phyloseq_fun(dada_res = dada_res, taxtable = tax.table, tree = tree, metadata = "./sample_metadata.csv")

Main output: - robjects.Rdata with phyloseq object in data for raw counts and data_rel for relative abundance.

1.7 Decontamination

The decontam_fun function uses decontam R package with control samples to filter contaminants. The decontam package offers two main methods, frequency and prevalence (and then you can combine those methods). For frequency method, it is mandatory to have the dna concentration of each sample in phyloseq (and hence in the sample-metadata.csv). “In this method, the distribution of the frequency of each sequence feature as a function of the input DNA concentration is used to identify contaminants.” In the prevalence methods no need of DNA quantification. “In this method, the prevalence (presence/absence across samples) of each sequence feature in true positive samples is compared to the prevalence in negative controls to identify contaminants.

Tips: sequencing plateforms often quantify the DNA before sequencing, but do not automaticaly give the information. Just ask for it ;).

Our function integrates the basics ASV frequency (nb_reads_ASV/nb_total_reads) and prevalence (nb_sample_ASV/nb_total_sample) filtering. As in our lab we had a known recurrent contaminant we included an option to filter out ASV based on they taxa names.

data = decontam_fun(data = data, domain = "Bacteria", column = "type", ctrl_identifier = "control", spl_identifier = "sample", number = 100)

Main output: - robjects.Rdata with contaminant filtered phyloseq object named data. - Exclu_out.csv list of filtered ASVs for each filtering step. - Kronas before and after filtering. - raw_asv-table.csv & relative_asv-table.csv. - venndiag_filtering.png.

venndiag

1.8 Plots, diversity and statistics

!!! We are currently developping a ShinyApp to visualize your data, sub-select your samples/taxons and do all those analyses interactively !!! ExploreMetabar

1.8.0.1 Beta diversity

# beta <- diversity_beta_light(data = data, output = "./plot_div_beta/", glom = "ASV", column1 = "temps", column2 = "souche", covar ="")
beta = diversity_beta_light(data, col = "souche", cov="temps", dist0 = "bray", ord0 = "MDS", output="./plot_div_beta/", tests = TRUE)
beta2 = diversity_beta_light(data, col = "souche_temps", dist0 = "bray", ord0 = "MDS", output="./plot_div_beta/", tests = TRUE)
1.8.0.1.1 Ordination plot
ggplotly(beta$plot)
ggplotly(beta2$plot)
1.8.0.1.2 Statistical tests
  • permanova
beta$permanova
## 
## Call:
## adonis(formula = as.formula(paste("dist1 ~ Depth +", paste(cov1,      collapse = "+"), "+", col)), data = mdata, permutations = 1000) 
## 
## Permutation: free
## Number of permutations: 1000
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2   Pr(>F)    
## Depth      1    0.5075 0.50751  3.1218 0.06845 0.005994 ** 
## temps      1    2.1846 2.18458 13.4380 0.29463 0.000999 ***
## souche     1    1.4711 1.47112  9.0493 0.19841 0.000999 ***
## Residuals 20    3.2514 0.16257         0.43851             
## Total     23    7.4146                 1.00000             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • pairwise permanova on concatenated factors
pander(beta2$pairwisepermanova)
Table continues below
pairs Df SumsOfSqs F.Model R2 p.value
sauvage_t0 vs mutant_t0 1 0.9528 4.64 0.317 0.005
sauvage_t0 vs sauvage_t50 1 2.021 28.97 0.7434 0.003
sauvage_t0 vs mutant_t50 1 2.197 26 0.7223 0.002
mutant_t0 vs sauvage_t50 1 1.681 11.59 0.5369 0.005
mutant_t0 vs mutant_t50 1 1.57 9.826 0.4956 0.003
sauvage_t50 vs mutant_t50 1 1.818 75.22 0.8827 0.007
p.adjusted sig
0.006 *
0.006 *
0.006 *
0.006 *
0.006 *
0.007 *

1.8.1 Differential analyses

1.8.1.1 Metacoder

out1 = metacoder_fun(data = data, output = "./metacoder", column1 = "souche_temps", rank = "Family", signif = TRUE, plottrees = TRUE, min ="10", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
  • Table
 DT::datatable(out1$table, filter = "top", options = list(scrollX = TRUE))
  • Comparaison 1
 out1$sauvage_t0_vs_mutant_t0$plot

  • Comparaison 2
 out1$sauvage_t50_vs_mutant_t50$plot

1.8.1.2 DESeq2

#fig.keep='all', fig.align='left', fig.width = 15, fig.height = 10
out2 = deseq2_fun(data = data, output = "./deseq/", column1 = "souche_temps", verbose = 1, rank = "Family", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
ggplotly(out2$sauvage_t50_vs_mutant_t50$plot)
DT::datatable(out2$sauvage_t50_vs_mutant_t50$table, filter = "top", options = list(scrollX = TRUE))

1.8.1.3 MetagenomeSeq

out3 = metagenomeseq_fun(data = data, output = "./metagenomeseq/", column1 = "souche_temps", verbose = 1, rank = "Family", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
ggplotly(out3$sauvage_t50_vs_mutant_t50$plot)
DT::datatable(out3$sauvage_t50_vs_mutant_t50$table, filter = "top", options = list(scrollX = TRUE))

1.8.1.4 Aggregate methods

resF = aggregate_fun(data = data, metacoder = "./metacoder/metacoder_signif_Family.csv", deseq = "./deseq/", mgseq = "./metagenomeseq/", output = "./aggregate_diff/", column1 = "souche_temps", column2 = NULL, verbose = 1, rank = "Genus", comp = "sauvage_t50~mutant_t50,sauvage_t0~mutant_t0")
ggplotly(resF$sauvage_t0_vs_mutant_t0$plot)
ggplotly(resF$sauvage_t50_vs_mutant_t50$plot)
DT::datatable(resF$table, filter = "top", options = list(scrollX = TRUE))